Because the ENTS features extracted in this notebook have some problems, it may be worthwhile to feed the predictions that ENTS itself makes into the classifier. This uses a file containing the ENTS classifier's predictions over the human genome:
In [1]:
cd ../../ents/
In [8]:
ls
In [10]:
!head 9606_0.50_predictions
These are simply pairs of Ensembl IDs, each with a confidence value that the interaction exists. Using a method similar to that used to extract the STRING summary features, we can make an object that returns these values in our feature vector assembler. The first step is to load the dictionary mapping between Entrez and Ensembl IDs:
In [11]:
cd ../geneconversion/
In [12]:
import pickle
In [13]:
# pickle files should be opened in binary mode
with open("human.gene2ensemble.pickle", "rb") as f:
    gene2ensembl = pickle.load(f)
As before, invert this dictionary:
In [14]:
ensembl2gene = {}
for k in gene2ensembl:
    for p in gene2ensembl[k]:
        # append to the existing list, or start a new one for an unseen Ensembl ID
        # (handling each ID separately avoids overwriting lists built earlier)
        ensembl2gene.setdefault(p, []).append(k)
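The inversion above can be checked on a small example. This is a minimal sketch, assuming the pickled dictionary maps each Entrez ID to a list of Ensembl IDs; the names `toy_gene2ensembl` and `toy_ensembl2gene` and the IDs in them are hypothetical and only illustrate how one Ensembl ID can end up mapped to several Entrez IDs.

```python
from collections import defaultdict

# Hypothetical toy mapping: Entrez gene ID -> list of Ensembl IDs.
# These IDs are made up for illustration and are not taken from the real pickle.
toy_gene2ensembl = {
    "1": ["ENS_A", "ENS_B"],
    "2": ["ENS_B"],
}

# Invert the one-to-many mapping: Ensembl ID -> list of Entrez gene IDs.
toy_ensembl2gene = defaultdict(list)
for gene_id, ensembl_ids in toy_gene2ensembl.items():
    for ensembl_id in ensembl_ids:
        toy_ensembl2gene[ensembl_id].append(gene_id)

# Convert back to a plain dict so that missing keys raise KeyError,
# which the pair-building loop relies on to skip unmapped IDs.
toy_ensembl2gene = dict(toy_ensembl2gene)
```

Here `ENS_B` is shared by two Entrez IDs, so it maps to a list of both, which is why the later step has to iterate over combinations of IDs.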
In [15]:
cd ../ents/
As before, build a dictionary mapping Entrez gene pairs, stored as frozensets, to these prediction values:
In [16]:
import csv
import itertools
In [19]:
import pdb
In [22]:
entsdict = {}
# no header this time
with open("9606_0.50_predictions") as f:
    c = csv.reader(f, delimiter="\t")
    # iterate over rows building the dictionary
    for l in c:
        # first look up the (possibly multiple) Entrez IDs for each Ensembl ID
        try:
            geneids1 = ensembl2gene[l[0]]
            geneids2 = ensembl2gene[l[1]]
        except KeyError:
            # give up on the pair if either ID can't be mapped to Entrez
            continue
        # then store the prediction value under every combination of Entrez IDs
        for i1, i2 in itertools.product(geneids1, geneids2):
            entsdict[frozenset([i1, i2])] = l[2]
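To see why frozensets are used as keys, the same loop can be run on a couple of made-up rows. This is only a sketch: the rows, the IDs and the `toy_` names are hypothetical, and an in-memory `io.StringIO` stands in for the predictions file.

```python
import csv
import io
import itertools

# Hypothetical Ensembl -> Entrez mapping and two made-up prediction rows.
toy_ensembl2gene = {"ENS_A": ["1"], "ENS_B": ["2", "3"]}
toy_file = io.StringIO("ENS_A\tENS_B\t0.75\nENS_A\tENS_X\t0.60\n")

toy_entsdict = {}
for row in csv.reader(toy_file, delimiter="\t"):
    try:
        ids1 = toy_ensembl2gene[row[0]]
        ids2 = toy_ensembl2gene[row[1]]
    except KeyError:
        # ENS_X has no Entrez mapping, so the second row is skipped
        continue
    # one entry per combination of Entrez IDs; the value is kept as a string
    for i1, i2 in itertools.product(ids1, ids2):
        toy_entsdict[frozenset([i1, i2])] = row[2]

# A frozenset key ignores the order of the pair.
same_value = toy_entsdict[frozenset(["2", "1"])] == toy_entsdict[frozenset(["1", "2"])]
```

Because the key is unordered, a lookup for the pair (1, 2) and for (2, 1) returns the same entry. Note also that a pair of identical IDs collapses to a single-element frozenset.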
Then we import the class used earlier to save the STRING-based feature, instantiate it and pickle it:
In [24]:
import sys
In [25]:
sys.path.append("../opencast-bio/")
In [26]:
import ocbio.ppipred
In [27]:
entsfeatures = ocbio.ppipred.features(entsdict,1)
In [28]:
with open("human.Entrez.ENTS.summary.pickle", "wb") as f:
    pickle.dump(entsfeatures, f)
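A quick way to confirm that a pickle written like this can be read back is a round trip. The sketch below uses a plain dictionary with a frozenset key as a stand-in, since the `ocbio.ppipred.features` class is not reproduced here; the file name `toy.ENTS.pickle` and the data are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the real feature object: a dict keyed by frozensets.
toy_entsdict = {frozenset(["1", "2"]): "0.75"}

# Write to a temporary directory so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "toy.ENTS.pickle")
with open(path, "wb") as f:
    pickle.dump(toy_entsdict, f)

# Read it back in binary mode and compare with the original.
with open(path, "rb") as f:
    reloaded = pickle.load(f)

round_trip_ok = reloaded == toy_entsdict
```

Frozenset keys survive pickling unchanged, so order-independent lookups still work on the reloaded object.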